Abstract: This paper considers rateless network error correction codes for reliable multicast in the presence of adversarial errors. Most existing network error correction codes are designed for a given network capacity and maximum number of errors known a priori to the encoder and decoder. However, in certain practical settings it may be necessary to operate without such a priori knowledge. We present rateless coding schemes for two adversarial models, in which the source sends more redundancy over time until decoding succeeds. The first model assumes a secret channel between the source and the destination that the adversaries cannot overhear; the rate of this channel is negligible compared to that of the main network. In the second model, instead of a secret channel, the source and destination share random secrets independent of the input information; the amount of secret information required is negligible compared to the amount of information sent. Both schemes are optimal in that decoding succeeds with high probability when the total amount of information received by the sink satisfies the cut-set bound with respect to the amount of message and error information. The schemes are distributed, polynomial-time and end-to-end: apart from the source and destination nodes, all intermediate nodes carry out classical random linear network coding.

I. Introduction 

Network coding is a technique that allows the mixing of data at intermediate network nodes instead of simply relaying them. It has been shown theoretically that such codes can achieve the multicast network capacity and can be implemented in a distributed manner, while also improving robustness against packet losses and link failures [1]–[4]. However, compared with pure forwarding of packets, network coding is more vulnerable to attacks by malicious adversaries that inject corrupted packets, since a single corrupted packet is mixed with other packets in the network. The use of coding to correct such errors information-theoretically was introduced by [5], [6], and capacity-achieving network error correction codes have been proposed for various adversary and network models, e.g., [7]–[9]. However, most existing schemes assume a given min cut (capacity) of the network and a given maximum number of adversarial errors for the purposes of code design and encoding. Such an assumption may be overly restrictive in many practical settings. For example, in large peer-to-peer content distribution networks, estimating network capacity is not easy, and the capacity is likely to change over time as users join and leave the network. Furthermore, it would be even more difficult to determine the number of malicious nodes and their strength. This issue becomes more serious if the source is multicasting to many destinations, where different destinations may require different code constructions to suit their own parameters.

This paper proposes rateless network error correction codes that do not require a priori estimates of the network capacity and the number of errors. The source transmits redundancy incrementally until decoding succeeds. The supply of encoded packets is potentially limitless, and the number of encoded packets actually transmitted is determined by the number of errors that occur. A number of related works, e.g., [10]–[12], propose cryptographic schemes that can be used to detect and remove errors in rateless network codes, while [13] proposes a rateless network error correction scheme that requires cryptographic means of verifying successful decoding. In contrast, our work presents the first completely information-theoretic rateless network error correction codes.

We design two algorithms targeting different network models. In the first model, also studied in [8], there is a secret channel between the source and the destination that is hidden from the adversary (who is omniscient except for the secret), and the rate of the channel is negligible compared to that of the network. In this case, over time we incrementally send more linearly dependent redundancy of the source message through the network to combat erasures, and incrementally send more (linearly independent) short hashes of the message on the secret channel to eliminate fake information. The destination amasses both kinds of redundancy until it decodes successfully. The code adapts to the actual min cut of the network as well as to the number of errors.

The second scenario is the random secret model [9], where instead of a secret channel, the source and destination share a "small" fixed random secret that is independent of the input message. The amount of secret information required is again negligible compared to the amount of information sent. The random secret model may be more realistic than the secret channel model because it allows the source and destination to share their secrets in advance and use them for later communication. It is also possible for the source and destination to share only a secret seed and generate pseudo-random sequences from the seed [14]. Compared to the secret channel model, the challenge is that both linearly dependent and independent redundancy must be sent over the public and unreliable network. Again, our code adapts to the network and adversary parameters.

Both schemes are distributed, with polynomial-time complexity of design and implementation. They assume no knowledge of the topology and work in both wired and wireless networks. Moreover, implementation involves only slightly modifying the source encoder and destination decoder, while internal nodes use standard random linear network coding.

II. Network Models 

A. Adversary Model 

The source Alice wishes to communicate reliably with the 
destination Bob over a general network, where there is a hid- 
den attacker Calvin who wants to disrupt the communication. 
Calvin is assumed to be able to observe all the transmissions 
over the network, and know the encoding and decoding 
schemes at Alice, Bob, as well as all other intermediate 
nodes. He is also aware of the network topology. Calvin 
can corrupt transmitted packets or inject erroneous packets. 
Finally, we assume Calvin to be computationally unbounded, 
so information-theoretic security is required in this case. 

However, we assume that Calvin's knowledge is limited in some respects. Two limitation models are discussed in this paper. In the first model, in addition to the given network, we assume there is a secret channel between Alice and Bob, i.e., the information transmitted on this channel cannot be observed or modified by Calvin [8]. However, the rate of this channel is negligible compared to that of the network. In the second model, we assume the source and destination share a small amount of random secret information that is independent of the input information [9]. Again, the amount of secret information required is negligible compared to the amount of information sent. As we will show later, the differences between the two models and the respective code constructions are substantial.

B. Network Model 

We model the network in the general case as a hypergraph where nodes are vertices and hyperedges are directed from the transmitting nodes to the set of receiving nodes [15]. Let $\mathcal{E}$ be the set of hyperedges and $\mathcal{V}$ be the set of nodes. Alice and Bob are not assumed to know the topology of the hypergraph. They also may not know the capacity of the network or the number of errors that the adversary can inject.

Source Alice encodes her information bits into a batch of $b$ packets by the encoding schemes described in subsequent sections. Each packet contains a sequence of $n+b$ symbols from the finite field $\mathbb{F}_q$. Let the matrix $X_0\in\mathbb{F}_q^{b\times(n+b)}$ be one batch of packets from Alice that is to be communicated to Bob. We call the successful communication of one batch of information $X_0$ to the destination a session. For clarity, we focus on one such session. In the rateless setting, because Alice does not know the network capacity and error patterns, a session may require multiple network transmissions until Bob receives enough redundancy to cope with errors and decode correctly. Assume in general that a session involves $N$ stages, i.e., $N$ uses of the network, where $N$ is a variable. During the $i$-th stage, $1\le i\le N$, denote the capacity (min cut from Alice to Bob) of the network by $M_i$, and the number of errors (min cut from Calvin to Bob) that the adversary injects by $z_i$. We assume $z_i < M_i$; otherwise the network capacity is completely filled with errors and it is not possible to transmit anything. For any realistic network, $M_i$ is always bounded. For example, let $C_i$ be the number of transmission opportunities available to the source during the $i$-th stage; then $M_i\le C_i$. For convenience we further assume $C_i\le c$, $\forall i$.

III. Code Construction for Secret Channel Model 

A. Encoder 

Alice's encoder has a structure similar to the secret channel model in [8], but operates in a rateless manner. In each session Alice transmits $nb$ incompressible information symbols from $\mathbb{F}_q$ to Bob. Alice arranges them into a matrix $W\in\mathbb{F}_q^{b\times n}$ and encodes $X_0 = (W\ \ I_b)$, where $I_b$ is the identity matrix of dimension $b$. Then, as in [16], [17], Alice performs random linear combinations of the rows of $X_0$ to generate her transmitted packets. Specifically, Alice draws a random matrix $K_1\in\mathbb{F}_q^{C_1\times b}$ and encodes $X_1 = K_1X_0$. Note that the redundant identity matrix receives the same linear transform, so that $X_0$ can be recovered. $X_1$ is then sent over a network where intermediate nodes implement random linear coding. In addition, Alice hashes the message and sends the hash through the secret channel. She sets $\alpha_1 = bC_1$ and draws random symbols $r_1,\ldots,r_{\alpha_1+1}$ independently and uniformly from $\mathbb{F}_q$. Note that the $\{r_j\}$ are drawn secretly so that Calvin cannot observe them. Let $D_1 = [d_{kj}]\in\mathbb{F}_q^{(n+b)\times(\alpha_1+1)}$, where $d_{kj} = (r_j)^k$; the hash is then computed as $H_1 = X_0D_1$. Finally, Alice sends $r_1,\ldots,r_{\alpha_1+1}$ and $H_1$ to Bob through the secret channel. The size of the secret is $(\alpha_1+1)(b+1)$, which is asymptotically negligible in $n$.
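To make the stage-1 encoder concrete, the following Python sketch builds $X_0 = (W\ \ I_b)$, the network packets $X_1 = K_1X_0$, the hash matrix $D_1$ with $d_{kj} = r_j^k$, and the hash $H_1 = X_0D_1$ over a prime field. It is a minimal sketch under stated assumptions: the prime `P`, the toy dimensions and the helper names (`matmul`, `append_identity`, `hash_matrix`) are hypothetical illustrations, and the analysis itself requires $q$ to be large.

```python
import random

P = 2_147_483_647  # assumed prime field size q (2^31 - 1); the analysis needs q large


def matmul(A, B, p=P):
    """Matrix product over GF(p); matrices are lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) % p for col in zip(*B)] for row in A]


def append_identity(W):
    """X0 = (W  I_b): append a b x b identity to the b x n message matrix W."""
    b = len(W)
    return [list(row) + [int(i == j) for j in range(b)] for i, row in enumerate(W)]


def hash_matrix(rs, length, p=P):
    """D = [d_kj] with d_kj = r_j**k for k = 1..length (one column per secret r_j)."""
    return [[pow(r, k, p) for r in rs] for k in range(1, length + 1)]


rng = random.Random(1)
b, n, C1 = 2, 4, 3                        # toy sizes (assumed)
W = [[rng.randrange(P) for _ in range(n)] for _ in range(b)]
X0 = append_identity(W)
K1 = [[rng.randrange(P) for _ in range(b)] for _ in range(C1)]
X1 = matmul(K1, X0)                       # stage-1 packets sent over the network

alpha1 = b * C1                           # alpha_1 = b * C_1
rs = [rng.randrange(P) for _ in range(alpha1 + 1)]  # secret symbols r_1 .. r_{alpha_1 + 1}
D1 = hash_matrix(rs, n + b)
H1 = matmul(X0, D1)                       # hash sent on the secret channel
secret_size = (alpha1 + 1) * (b + 1)      # the r_j plus the b x (alpha_1 + 1) hash
```

Because the identity block travels with the data, the last $b$ entries of each row of `X1` are exactly that packet's coding coefficients (here the rows of `K1`), which is why the redundant identity lets the receiver undo the random mixing.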

Alice then keeps sending more redundant information to Bob as follows. For the $i$-th stage, $i\ge 2$, Alice draws a random matrix $K_i\in\mathbb{F}_q^{C_i\times b}$, encodes $X_i = K_iX_0$, and sends $X_i$ over the network. In addition, Alice again secretly draws $r_1,\ldots,r_{\alpha_i}$ at random from $\mathbb{F}_q$, where $\alpha_i = bC_i$. She then constructs $D_i = [d_{kj}]\in\mathbb{F}_q^{(n+b)\times\alpha_i}$, where $d_{kj} = (r_j)^k$, and computes $H_i = X_0D_i$. Alice then sends $r_1,\ldots,r_{\alpha_i}$ and $H_i$ to Bob through the secret channel. The size of the secret is $\alpha_i(b+1)$, again asymptotically negligible in $n$. Alice repeats this procedure until Bob indicates decoding success, at which point Alice ends the current session and moves on to the next session.

B. Decoder 

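The network performs a classical distributed network code (which is known to suffice to achieve the multicast capacity [4]). Specifically, each packet transmitted by an intermediate node is a random linear combination of its incoming packets. For the $i$-th stage, we can describe this linear relation as
$$Y_i = T_iX_i + Q_iZ_i,$$
where $Y_i\in\mathbb{F}_q^{M_i\times(n+b)}$ is Bob's received observation, $Z_i\in\mathbb{F}_q^{z_i\times(n+b)}$ are the errors injected by Calvin, and $T_i$ and $Q_i$ are defined to be the transfer matrices from Alice to Bob and from Calvin to Bob, respectively. By stacking all the batches of observations received by the $i$-th stage, let
$$Y^{(i)} = \begin{bmatrix} Y_1\\ \vdots\\ Y_i\end{bmatrix},\quad T^{(i)} = \begin{bmatrix} T_1K_1\\ \vdots\\ T_iK_i\end{bmatrix},\quad Q^{(i)} = \begin{bmatrix} Q_1 & & \\ & \ddots & \\ & & Q_i\end{bmatrix},\quad Z^{(i)} = \begin{bmatrix} Z_1\\ \vdots\\ Z_i\end{bmatrix},$$
$$\hat T^{(i)} = \big[T^{(i)}\ \ Q^{(i)}\big],\qquad D^{(i)} = \big[D_1\ \cdots\ D_i\big],\qquad H^{(i)} = \big[H_1\ \cdots\ H_i\big].$$
Then we have
$$Y^{(i)} = \hat T^{(i)}\begin{bmatrix} X_0\\ Z^{(i)}\end{bmatrix}, \quad (1)$$
$$H^{(i)} = X_0D^{(i)}, \quad (2)$$
where (1) follows from the network transform, and (2) follows from the code construction. Notice that only $Y^{(i)}$ and $H^{(i)}$ (together with the secret symbols that define $D^{(i)}$) are available to Bob, and he needs to recover $X_0$ from equations (1) and (2). Bob can accomplish this only if $X_0$ is in the row space of $Y^{(i)}$.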

Suppose $X_0$ indeed lies in the row space of $Y^{(i)}$ (which happens with high probability for some $i$, as shown later); then there exists $X_s$ such that
$$X_0 = X_sY^{(i)}. \quad (3)$$
Therefore Bob only needs to find $X_s$, which may be achieved by solving
$$X_sY^{(i)}D^{(i)} = H^{(i)}. \quad (4)$$

If (4) has a unique solution for $X_s$, Bob reconstructs $X_0$ according to (3) and feeds back an acknowledgement of decoding success to Alice. If there exists no solution to (4), Bob waits for more redundancy to arrive. Otherwise, if there are multiple solutions to (4), Bob declares a decoding failure.
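The decision rule above can be sketched as follows: Bob forms $M = Y^{(i)}D^{(i)}$, solves $X_sM = H^{(i)}$ by Gauss-Jordan elimination over a prime field, and classifies the outcome as unique, none or multiple. This is a minimal sketch assuming a prime `P`, received rows that are linearly independent, and toy data in place of a real network; the helper names `solve_left` and `try_decode` are hypothetical.

```python
import random

P = 2_147_483_647  # assumed prime field size


def matmul(A, B, p=P):
    return [[sum(a * b for a, b in zip(row, col)) % p for col in zip(*B)] for row in A]


def solve_left(M, H, p=P):
    """Classify solutions X of X M = H over GF(p) as 'none', 'unique' or 'multiple'."""
    # X M = H  <=>  M^T X^T = H^T: Gauss-Jordan elimination on [M^T | H^T].
    aug = [list(mc) + list(hc) for mc, hc in zip(zip(*M), zip(*H))]
    n_vars, rows = len(M), len(aug)
    pivots, r = [], 0
    for c in range(n_vars):
        piv = next((i for i in range(r, rows) if aug[i][c]), None)
        if piv is None:
            continue
        aug[r], aug[piv] = aug[piv], aug[r]
        inv = pow(aug[r][c], p - 2, p)            # Fermat inverse in GF(p)
        aug[r] = [v * inv % p for v in aug[r]]
        for i in range(rows):
            if i != r and aug[i][c]:
                f = aug[i][c]
                aug[i] = [(x - f * y) % p for x, y in zip(aug[i], aug[r])]
        pivots.append(c)
        r += 1
    if any(any(row[n_vars:]) for row in aug[r:]):
        return "none", None                        # inconsistent: 0 = nonzero
    if r < n_vars:
        return "multiple", None                    # free variables remain
    Xt = [aug[i][n_vars:] for i in range(r)]       # pivot columns are 0..n_vars-1
    return "unique", [list(col) for col in zip(*Xt)]


def try_decode(Y, D, H, p=P):
    """unique -> X0 = Xs Y as in (3); none -> wait; multiple -> declare failure."""
    status, Xs = solve_left(matmul(Y, D, p), H, p)
    return status, (matmul(Xs, Y, p) if status == "unique" else None)


rng = random.Random(7)
b, n = 2, 3                                        # toy sizes (assumed)
X0 = [[rng.randrange(P) for _ in range(n)] + [int(i == j) for j in range(b)] for i in range(b)]
rs = [rng.randrange(P) for _ in range(2 * b + 1)]  # toy number of secret symbols
D = [[pow(r, k, P) for r in rs] for k in range(1, n + b + 1)]
H = matmul(X0, D)
K = [[rng.randrange(P) for _ in range(b)] for _ in range(b)]
Y_clean = matmul(K, X0)                            # b error-free packets
Z = [rng.randrange(P) for _ in range(n + b)]       # one injected packet
Y_err = Y_clean + [Z]                              # the error adds one dimension
Y_bad = [Y_clean[0], Z]                            # X0 is not in this row space
```

With `Y_err` the unique solution assigns zero weight to the injected row, so Bob still recovers $X_0$; with `Y_bad` the system is (with high probability) inconsistent, and Bob simply waits for more redundancy.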

C. Performance 

In the following we show that the probability of error, including the events that Bob declares an error, that $X_0$ is not in the row space of $Y^{(i)}$, or that there exists some other $X'\neq X_0$ that satisfies (4), vanishes as $q\to\infty$.

We first show that, under proper conditions, $X_0$ is in the row span of $Y^{(i)}$. The following lemma is well known [11]:

Lemma 1: If the linear transform $\hat T^{(i)}$ has full column rank, i.e., $\mathrm{Rank}(\hat T^{(i)}) = b + \sum_{j=1}^i z_j$, then it is left-invertible, and there exists $X_s$ such that $X_sY^{(i)} = X_0$.

Now we show that $\hat T^{(i)}$ almost always has either full column rank or full row rank as $q\to\infty$.

Lemma 2: If $b + \sum_{j=1}^i z_j\le\sum_{j=1}^i M_j$, then $\hat T^{(i)}$ has full column rank with high probability.¹

¹Event $E$ happens with high probability (w.h.p.) if $\lim_{q\to\infty}\Pr\{E\} = 1$.
Proof: The proof follows an idea similar to [8], with the difference that we consider communication over multiple stages. Notice that $\hat T^{(i)} = [T^{(i)}\ \ Q^{(i)}]$. Since $b + \sum_{j=1}^i z_j\le\sum_{j=1}^i M_j$, it follows that $b\le\sum_{j=1}^i M_j$. Because each $T_j$ has full rank and the $K_j$ are random matrices, if $b\le\sum_{j=1}^i M_j$, then the probability that the columns of $T^{(i)}$ are linearly dependent is upper bounded by $b/q\to 0$ by the Schwartz-Zippel lemma. So $T^{(i)}$ has full column rank w.h.p. Without loss of generality we assume $Q^{(i)}$ also has full column rank; otherwise we can select a basis of the column space of $Q^{(i)}$ and reformulate the problem with reduced $z_j$. Furthermore, by [18], if $b + \sum_{j=1}^i z_j\le\sum_{j=1}^i M_j$, the probability that the column spans of $T^{(i)}$ and $Q^{(i)}$ intersect anywhere other than in the zero vector is upper bounded by $i^2|\mathcal{V}||\mathcal{E}|q^{-1}$ for a fixed adversary pattern, where $|\mathcal{V}|$ and $|\mathcal{E}|$ are the numbers of nodes and hyperedges. Since Calvin can choose his locations in at most $\binom{i|\mathcal{E}|}{\sum_{j=1}^i z_j}$ ways, by the union bound, the probability that the column spans of $T^{(i)}$ and $Q^{(i)}$ intersect nontrivially is bounded by $\binom{i|\mathcal{E}|}{\sum_{j=1}^i z_j}\,i^2|\mathcal{V}||\mathcal{E}|q^{-1}\to 0$. Hence $\hat T^{(i)}$ has full column rank with high probability. ■

Corollary 1: If $b + \sum_{j=1}^i z_j\le\sum_{j=1}^i M_j$, then with high probability there exists an $X_s$ such that $X_sY^{(i)} = X_0$.

Next we need to show that the solution is unique, i.e., that the hash is strong enough for Bob to distill the injected errors.

Lemma 3: For any $X'\neq X_0$, the probability that $X'D^{(i)} = H^{(i)}$ is bounded from above by $((n+b)/q)^{\sum_{k=1}^i\alpha_k + 1}$.

Proof: It is equivalent to consider the probability that $(X' - X_0)D^{(i)} = 0$. Since $X' - X_0\neq 0$, there is at least one row in which $X'$ differs from $X_0$. Denote this row of $X' - X_0$ as $(x_1,\ldots,x_{n+b})$; then the $j$-th entry of the corresponding row of $(X' - X_0)D^{(i)}$ is $F(r_j) = \sum_{k=1}^{n+b}x_kr_j^k$. Because $F$ is not the zero polynomial and has degree at most $n+b$, the probability (over $r_j$) that $F(r_j) = 0$ is at most $(n+b)/q$. Because $D^{(i)}$ has $\sum_{k=1}^i\alpha_k + 1$ columns, and all $r_j$, $1\le j\le\sum_{k=1}^i\alpha_k + 1$, are independently chosen, the probability that the entire row is a zero vector is at most $((n+b)/q)^{\sum_{k=1}^i\alpha_k + 1}$. This is also an upper bound on the probability that the entire matrix $(X' - X_0)D^{(i)}$ is zero. ■
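The single-column step of this proof can be checked exhaustively on a small field. The sketch below counts, for random nonzero rows $x$ of length $n+b$, the exact number of $r\in\mathbb{F}_q$ with $F(r) = \sum_{k=1}^{n+b}x_kr^k = 0$; the field size `p = 101` and the row length are illustrative assumptions only. Since a nonzero polynomial of degree at most $n+b$ has at most $n+b$ roots, the count never exceeds $n+b$.

```python
import random


def F(x, r, p):
    """F(r) = sum_{k=1}^{n+b} x_k r^k: one entry of (X' - X0) D for a secret symbol r."""
    return sum(xk * pow(r, k, p) for k, xk in enumerate(x, start=1)) % p


def count_roots(x, p):
    """Exact number of r in GF(p) with F(r) = 0, by exhaustive enumeration."""
    return sum(1 for r in range(p) if F(x, r, p) == 0)


p, length = 101, 6                       # toy field size q and row length n + b (assumed)
rng = random.Random(3)
rows = []
while len(rows) < 50:
    x = [rng.randrange(p) for _ in range(length)]
    if any(x):                           # Lemma 3 only concerns a nonzero row of X' - X0
        rows.append(x)
worst = max(count_roots(x, p) for x in rows)
```

Note that $r = 0$ is always a root because $F$ has no constant term; the bound $(n+b)/q$ already accounts for it. Independent secret symbols multiply these per-column failure probabilities, which gives the exponent in Lemma 3.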

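Lemma 4: The probability that there exists $V_s\neq X_s$ such that $V_sY^{(i)}\neq X_0$ but $V_sY^{(i)}D^{(i)} = H^{(i)}$ is upper bounded by $(n+b)^{\sum_{k=1}^i\alpha_k + 1}/q\to 0$.

Proof: Note that $V_s$ is a $b\times\sum_{j=1}^i M_j$ matrix over $\mathbb{F}_q$, so it has at most $b\sum_{j=1}^i M_j\le b\sum_{j=1}^i C_j = \sum_{j=1}^i\alpha_j$ entries, and hence at most $q^{\sum_{j=1}^i\alpha_j}$ possible values. So by invoking Lemma 3 with $X' = V_sY^{(i)}$ and then taking the union bound over all possible choices of $V_s$, the probability is at most $q^{\sum_{j=1}^i\alpha_j}\,((n+b)/q)^{\sum_{j=1}^i\alpha_j + 1} = (n+b)^{\sum_{j=1}^i\alpha_j + 1}/q$, and the claim follows. ■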
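The union bound in Lemma 4 is easy to evaluate in the log domain. The sketch below computes $\log_2$ of the numerator $(n+b)^{\sum_k\alpha_k + 1}$ with $\alpha_k = bC_k$, and the smallest $\log_2 q$ that pushes the bound below a chosen target. The function names, the parameters $n = 1000$, $b = 10$, $C_k = 20$ and the 40-bit target are hypothetical choices for illustration.

```python
import math


def lemma4_log2_numerator(n, b, C):
    """log2 of (n+b)^(sum_k alpha_k + 1), with alpha_k = b * C_k as in Section III-A."""
    total_alpha = sum(b * c for c in C)
    return (total_alpha + 1) * math.log2(n + b)


def min_field_bits(n, b, C, target_bits=40):
    """Smallest log2(q) making (n+b)^(sum_k alpha_k + 1) / q at most 2^-target_bits."""
    return math.ceil(lemma4_log2_numerator(n, b, C) + target_bits)


three_stages = lemma4_log2_numerator(1000, 10, [20, 20, 20])
bits_needed = min_field_bits(1000, 10, [20, 20, 20])
```

For these numbers the numerator is roughly $2^{6000}$, so the bound as stated only becomes small for very large fields. It is best read as the asymptotic guarantee $q\to\infty$ that the lemma states, not as a sizing rule.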
Now we are ready to present the main result for the secret channel model.

Theorem 1: For any $i$ such that $b + \sum_{j=1}^i z_j\le\sum_{j=1}^i M_j$, with the proposed coding scheme, Bob is able to decode $X_0$ correctly with high probability at the $i$-th stage. Otherwise, Bob waits for more redundancy instead of decoding erroneous packets.

Proof: By Corollary 1, we can solve $X_0$ from (3) and (4) if $b + \sum_{j=1}^i z_j\le\sum_{j=1}^i M_j$. By Lemma 4, if a solution exists, it is correct and unique. Otherwise, there is no solution to (4) and, by the algorithm, Bob waits for more redundancy. ■
Theorem 1 shows that the code is optimal in that decoding succeeds with high probability whenever the total amount of information received by the sink satisfies the cut-set bound, i.e., $b + \sum_{j=1}^i z_j\le\sum_{j=1}^i M_j$. If the bound is not satisfied, then it is not possible for Bob to decode correctly under any coding scheme. The following result shows that the code is rate-optimal if the network capacity and the number of errors are i.i.d. across stages.

Theorem 2: Assume $M_i$, $z_i$, $i = 1, 2, \ldots$ are i.i.d. random variables with means $E[M]$ and $E[z]$, respectively. If there exists $\epsilon > 0$ such that $E[M] - E[z]\ge\epsilon$, then with the proposed coding scheme, Bob is able to decode $X_0$ correctly with high probability in a finite number of stages. Further, on average the code achieves rate
$$r\ge\frac{b}{b+c-1}\big(E[M] - E[z]\big).$$

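Proof: Let $L = b/\epsilon$ and let the random variable $A_j = M_j - z_j$, so that $E[A_j]\ge\epsilon$, and denote $\mathrm{Var}[A_j] = \sigma_A^2 < \infty$. Then for $N > L$, $N\epsilon = b + (N-L)\epsilon$. By Chebyshev's inequality,
$$\Pr\Big\{\sum_{j=1}^N A_j\ge b\Big\}\ \ge\ \Pr\Big\{\sum_{j=1}^N A_j\ge E\Big[\sum_{j=1}^N A_j\Big] - (N-L)\epsilon\Big\}\ \ge\ 1 - \Pr\Big\{\Big|\sum_{j=1}^N A_j - E\Big[\sum_{j=1}^N A_j\Big]\Big| > (N-L)\epsilon\Big\}\ \ge\ 1 - \frac{N\sigma_A^2}{(N-L)^2\epsilon^2}\ \to\ 1$$
as $N\to\infty$, where the first inequality holds because $E[\sum_{j=1}^N A_j]\ge N\epsilon = b + (N-L)\epsilon$.

So with high probability there exists a finite $N > L$ such that $b + \sum_{j=1}^N z_j\le\sum_{j=1}^N M_j$, and by Theorem 1 Bob is able to decode successfully at stage $N$. Next we determine the average rate. Let
$$N = \min\Big\{t:\ \sum_{j=1}^t M_j - \sum_{j=1}^t z_j\ge b\Big\}.$$
Then the average rate of the code is
$$r = E\Big[\frac{b}{N}\Big]\ge\frac{b}{E[N]}, \quad (5)$$
where the last inequality follows from Jensen's inequality. Denote $S_t = \sum_{j=1}^t M_j - \sum_{j=1}^t z_j$; then $N$ is a stopping time for the random process $S_t$. Therefore, by the first Wald identity [19],
$$E[N]\big(E[M] - E[z]\big) = E[S_N]\le b + c - 1,$$
where the inequality holds because $S_{N-1}\le b-1$ by the definition of $N$, and $S_N - S_{N-1} = M_N - z_N\le c$. Therefore
$$E[N]\le\frac{b+c-1}{E[M] - E[z]}.$$
Substituting into (5), it follows that
$$r\ge\frac{b}{b+c-1}\big(E[M] - E[z]\big).$$
■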
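A small Monte Carlo experiment illustrates Theorem 2. The sketch below draws i.i.d. stages from an assumed toy distribution ($M_i$ uniform on $\{3,4,5\}$, so $c = 5$, and $z_i$ uniform on $\{0,1\}$), finds the stopping stage $N$ at which $\sum_j(M_j - z_j)\ge b$, and compares the empirical average of $b/N$ with the lower bound $\frac{b}{b+c-1}(E[M]-E[z])$. The distribution, $b = 20$ and the helper names are illustrative assumptions.

```python
import random


def stopping_stage(b, draw_stage, rng):
    """N = min{t : sum_{j<=t} (M_j - z_j) >= b}, the first stage meeting the cut-set condition."""
    s, t = 0, 0
    while s < b:
        m, z = draw_stage(rng)
        s += m - z
        t += 1
    return t


def average_rate(b, draw_stage, trials=2000, seed=0):
    """Monte Carlo estimate of E[b / N]."""
    rng = random.Random(seed)
    return sum(b / stopping_stage(b, draw_stage, rng) for _ in range(trials)) / trials


def toy_stage(rng):
    # Assumed toy model: M_i uniform on {3, 4, 5} (so c = 5), z_i uniform on {0, 1}.
    return rng.choice([3, 4, 5]), rng.choice([0, 1])


b, c, EM, Ez = 20, 5, 4.0, 0.5
bound = b / (b + c - 1) * (EM - Ez)       # Theorem 2 lower bound on the average rate
estimate = average_rate(b, toy_stage)
```

As $b$ grows with $c$ fixed, the factor $b/(b+c-1)$ tends to one, matching the remark that the rate approaches $E[M] - E[z]$.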



Notice that as we choose a sufficiently large $b$, the rate of the code approaches $E[M] - E[z]$, which has been shown to be the maximal achievable rate for networks with Byzantine adversaries [20]. The computational cost of design, encoding, and decoding is dominated by the cost of carrying out the matrix multiplication $Y^{(i)}D^{(i)}$ in (4), which is $O(n(ic)^3)$.
Notice that because $Y^{(i)}$ and $D^{(i)}$ grow regularly, $Y^{(i)}D^{(i)}$ is a block of $Y^{(i+1)}D^{(i+1)}$; therefore, careful implementation of the algorithm can improve complexity (though not in the order sense) by building on the results from the last stage. Assume in general that $Y^{(i)}D^{(i)} = A$ and
$$Y^{(i+1)}D^{(i+1)} = \begin{bmatrix} A & C\\ B & D\end{bmatrix}.$$

Then at stage $i+1$ we only need to perform the multiplications corresponding to blocks $B$, $C$, and $D$. The same trick applies when we perform row reduction on $Y^{(i)}D^{(i)}$. Suppose at the $i$-th stage we have already reduced $A$ into row echelon form with matrix $R$, i.e., with high probability it follows (otherwise there is a decoding error) that
$$RA = A' = \begin{bmatrix} I\\ 0\end{bmatrix}.$$

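At the $(i+1)$-th stage we want to reduce $Y^{(i+1)}D^{(i+1)}$ into row echelon form based on the knowledge of $R$. We can construct the row operations in the following steps: i) multiply by $R$ to reduce block $A$ and obtain $[A'\ \ RC]$ in the upper blocks; ii) use $A'$ to cancel block $B$ to zero; iii) perform row reduction on the lower right block corresponding to $D$; iv) use the row-reduced lower right block to cancel the upper right block to zero. Formally, let
$$R_{i+1} = \begin{bmatrix} I & -[RC\ \ 0]\\ 0 & I\end{bmatrix}\begin{bmatrix} I & 0\\ 0 & \tilde D\end{bmatrix}\begin{bmatrix} I & 0\\ -[B\ \ 0] & I\end{bmatrix}\begin{bmatrix} R & 0\\ 0 & I\end{bmatrix},$$
where $\tilde D$ is the matrix that row reduces $D - [B\ \ 0]RC$, and the $0$ after $B$ (and after $RC$) denotes a zero-padding sub-matrix of appropriate dimension. Then it follows that
$$R_{i+1}Y^{(i+1)}D^{(i+1)} = \begin{bmatrix} I & 0\\ 0 & 0\\ 0 & I\\ 0 & 0\end{bmatrix}.$$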

Finally we only need to permute the rows to place the identity 
matrix on top. Note that by this algorithm at every stage 
we only need to perform row reduction on a small block 
(corresponding to D) plus several multiplications. 
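The block-reuse idea for the product can be sketched with NumPy: given $A = Y^{(i)}D^{(i)}$, only the new blocks (old packets times new hash columns, new packets times old hash columns, and new packets times new hash columns) are computed, and the result is compared with the full product. The small prime `P` (chosen so that `int64` products cannot overflow), the shapes, and the name `extend_product` are assumptions for illustration; the block labels follow the text, with $C$ in the upper right and $B$ in the lower left.

```python
import numpy as np

P = 101  # assumed small prime so int64 products cannot overflow


def extend_product(A, Y_old, Y_new, D_old, D_new, p=P):
    """Return Y^(i+1) D^(i+1) mod p, reusing A = Y^(i) D^(i) and computing only new blocks."""
    C = Y_old @ D_new % p        # upper right: old packets x new hash columns
    B = Y_new @ D_old % p        # lower left: new packets x old hash columns
    D = Y_new @ D_new % p        # lower right: new packets x new hash columns
    return np.block([[A, C], [B, D]])


rng = np.random.default_rng(0)
n_plus_b = 8
Y_old = rng.integers(0, P, size=(4, n_plus_b))
Y_new = rng.integers(0, P, size=(3, n_plus_b))
D_old = rng.integers(0, P, size=(n_plus_b, 5))
D_new = rng.integers(0, P, size=(n_plus_b, 6))
A = Y_old @ D_old % P
full = np.vstack([Y_old, Y_new]) @ np.hstack([D_old, D_new]) % P
```

The saving is the omitted recomputation of `A`; the asymptotic order is unchanged, consistent with the remark in the text.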

IV. Code Construction for Random Secret Model 

In this section we consider the case where a secret channel is not available between Alice and Bob; instead, they only share a "small" random secret whose size is negligible compared to the amount of information sent. The random secret model is similar to the previously discussed secret channel model, with the difference that the secret information must be random and independent of the source message $X_0$. Therefore, in this case Alice cannot send the hash of $X_0$ to Bob secretly and reliably. Nutman and Langberg [9] modify the scheme in [8] against Byzantine adversaries under the secret channel model so that it also works for the random secret model. The essential idea is to prefix $X_0$ with a small matrix $L$ obtained by solving the hash relation $(L\ \ X_0)D = H$, where both $D$ and $H$ are the fixed random secrets. Then $(L\ \ X_0)$ is sent through the network as the new input message. However, there is no obvious way to make this scheme rateless, because in the rateless setting the number of columns of $D$ and $H$ grows over time as more linearly independent redundancy is needed, which implies that it is not possible to uniquely solve $L$ given a fixed $X_0$.

The difficulty in solving $L$ reveals the restriction of defining the hash relation in the matrix form $X_0D = H$. Hence a more flexible way of hashing the data is required, as discussed in the following. Recall that the vectorization of a matrix is a linear transformation which converts the matrix into a column vector by stacking the columns of the matrix on top of one another. Let the column vector $w\in\mathbb{F}_q^{nb}$ be the vectorized version of $W$² (recall that $W$ is the matrix of raw incompressible source data before attaching the identity matrix, defined in Section III-A).

To generate the hashes, i.e., the linearly independent redundancy that is transmitted at the $k$-th stage, we first draw $\alpha_k$ symbols from the random shared secrets as $d_1^{(k)},\ldots,d_{\alpha_k}^{(k)}\in\mathbb{F}_q$, and use them to construct the $\alpha_k\times nb$ parity check matrix $D_k = [d_{ij}^{(k)}]$, where $d_{ij}^{(k)} = (d_i^{(k)})^j$, $\forall\,1\le i\le\alpha_k$, $1\le j\le nb$. Then we draw another $\alpha_k$ symbols from the random shared secrets as $h_1^{(k)},\ldots,h_{\alpha_k}^{(k)}$ and use them to construct the hash of the message. Let $h_k = (h_1^{(k)},\ldots,h_{\alpha_k}^{(k)})^T$. Then the following parity check relation is enforced:
$$\big[D_k\ \ I_{\alpha_k}\big]\begin{bmatrix} w\\ l_k\end{bmatrix} = h_k, \quad (6)$$
where $I_{\alpha_k}$ is the identity matrix of dimension $\alpha_k$ and $l_k$ is a column vector of length $\alpha_k$ that can be solved uniquely from (6) as
$$l_k = h_k - D_kw.$$

Note that $\{l_k\}$ does not need to be kept secret. Now we can readily construct a rateless parity check scheme based on (6):
$$\begin{bmatrix} D_1 & I_{\alpha_1} & & & \\ D_2 & & I_{\alpha_2} & & \\ \vdots & & & \ddots & \\ D_i & & & & I_{\alpha_i}\end{bmatrix}\begin{bmatrix} w\\ l_1\\ l_2\\ \vdots\\ l_i\end{bmatrix} = \begin{bmatrix} h_1\\ h_2\\ \vdots\\ h_i\end{bmatrix}, \quad (7)$$
i.e., the total number of parity checks $\sum_i\alpha_i$ can grow over time if necessary.

²Compared to linear transforms on the matrix $W$, linear transforms on the vector $w$ are more general (every operation in the former class can be represented by an operation in the latter class, but not vice versa).

(7) implies that we are hashing $(w^T\ \ l^T)^T$, where $l = (l_1^T,\ldots,l_i^T)^T$, instead of the message $w$ itself. However, the advantage of introducing $l_k$ is that by attaching a short suffix to $w$, we can establish a virtual secret channel between Alice and Bob. Nevertheless, the challenge is that we then need to send not only $w$ but also $\{l_k\}$ over the unreliable network publicly. In the following we discuss the structure of the encoder and decoder for transmitting them successfully.
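The suffix construction can be sketched directly: the parity matrix $D_k$ is built from $\alpha_k$ shared secrets via $d_{ij}^{(k)} = (d_i^{(k)})^j$, the public suffix is $l_k = h_k - D_kw$, and Bob checks $[D_k\ \ I]\,[w;\,l_k] = h_k$. This minimal Python sketch assumes a prime field `P`, a toy $\alpha_k = k$ (the encoder below uses $\alpha_k = k\sigma m$), and hypothetical helper names; it also shows that tampering with either $w$ or $l_k$ breaks the check.

```python
import random

P = 2_147_483_647  # assumed prime field size


def parity_matrix(ds, nb, p=P):
    """D_k = [d_ij] with d_ij = (d_i)^j, j = 1..nb, built from alpha_k shared secrets d_i."""
    return [[pow(d, j, p) for j in range(1, nb + 1)] for d in ds]


def dot(u, v, p=P):
    return sum(a * b for a, b in zip(u, v)) % p


def make_suffix(Dk, hk, w, p=P):
    """l_k = h_k - D_k w, so that [D_k  I] [w; l_k] = h_k."""
    return [(h - dot(row, w, p)) % p for row, h in zip(Dk, hk)]


def check(Dk, hk, w, lk, p=P):
    """Bob's test of relation (6) for one stage k."""
    return all((dot(row, w, p) + l) % p == h for row, l, h in zip(Dk, lk, hk))


rng = random.Random(5)
nb, stages = 6, 3
w = [rng.randrange(P) for _ in range(nb)]
secrets = []
for k in range(1, stages + 1):
    alpha_k = k                              # toy alpha_k (assumed)
    ds = [rng.randrange(P) for _ in range(alpha_k)]
    hk = [rng.randrange(P) for _ in range(alpha_k)]
    Dk = parity_matrix(ds, nb)
    secrets.append((ds, Dk, hk, make_suffix(Dk, hk, w)))
```

Because $l_k$ is a deterministic function of public data and secrets that are independent of $w$, it can be sent in the clear, while an adversary who does not know $D_k$ and $h_k$ cannot adjust a corrupted $w$ or $l_k$ so that the check still holds, except with small probability.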

A. Encoder 

In order for Bob to decode successfully, both linearly dependent and linearly independent redundancy are required. Redundant information that lies in the row space of $X_0$ is called linearly dependent redundancy and is used to combat erasures in the network (deletions from the row space of $X_0$). Other redundant information is linearly independent redundancy and is used to distill the "fake information" that adversaries inject into the network (additions to the row space of $X_0$). In the secret channel model, we send linearly dependent redundancy $\{X_i\}$ in the network and send linearly independent redundancy $\{D_i\}$ and $\{H_i\}$ on the secret channel. In contrast, in the random secret model a secret channel is not available, and both linearly dependent and independent redundancy must be sent through the public and unreliable network.

However, notice that the linearly dependent redundancy corresponds to long messages that are usually arranged into long packets, while the linearly independent redundancy is short: its size is chosen to be independent of $n$, since the amount of random secrets required should be negligible compared to the amount of information sent. Therefore, it is convenient to encode and send the two kinds of redundancy separately, as long packets and short packets, respectively (for example, it would waste resources to send linearly independent redundancy in normal packets, because it is too short to fill a packet). We define $M_i$, $z_i$, $C_i$ and $c$ for long packets as described in Section II. For short packets, denote by $\tilde M_i$, $\tilde C_i$ and $\tilde z_i$ the min cut from Alice to Bob, the number of available transmission opportunities, and the min cut from Calvin to Bob at stage $i$, respectively. Similarly, we assume $\tilde z_i < \tilde M_i$, $\forall i$.

We first discuss the encoding scheme for linearly dependent redundancy. The source input message is arranged as a $b\times n$ matrix $W$. Then we encode $X_0 = (W\ \ I_b)$. At the $i$-th stage, Alice draws a random matrix $K_i\in\mathbb{F}_q^{C_i\times b}$ and encodes the long packets $X_i = K_iX_0$.

To generate the linearly independent redundancy, at stage $i$ Alice sets $\alpha_i = i\sigma m$ (the choices of $\sigma$ and $m$ are discussed in the next paragraph), solves $l_i$ according to (6), and arranges the column vector into a $\sigma\times im$ matrix $C_i$. Then she lets $L_j = (C_j\ \ O_D\ \ O_j\ \ I_\sigma)$, $1\le j\le i$, where $O_D$ is a zero matrix of size $\sigma\times(i-j)m$ and $O_j$ is the zero matrix of size $\sigma\times(j-1)\sigma$. $O_D$ is a dummy used to align $C_j$, and $O_j$ is used to align the identity matrix. Alice then draws a random matrix $G_i$ of size $\tilde C_i\times i\sigma$ and encodes the short packets as
$$A_i = G_i\begin{bmatrix} L_1\\ \vdots\\ L_i\end{bmatrix} = G_iL^{(i)},$$
where each $L_j$ is zero-padded on the right to the common length $i(m+\sigma)$.
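The layout of $L^{(i)}$ is easiest to see by building it. The NumPy sketch below stacks $L_j = (C_j\ \ O_D\ \ O_j\ \ I_\sigma)$, assuming each row block is zero-padded on the right to width $i(m+\sigma)$ and that $l_j$ is arranged column-wise into $C_j$. It then checks that the last $i\sigma$ columns of $L^{(i)}$ form an identity, so the last $i\sigma$ columns of $A_i = G_iL^{(i)}$ carry $G_i$ itself. The toy values $\sigma = 1$, $m = 2$, $i = 3$, four short-packet opportunities, and the name `build_L` are hypothetical.

```python
import numpy as np


def build_L(Cs, sigma, m):
    """Stack L_j = (C_j  O_D  O_j  I_sigma), zero-padded on the right to width i*(m+sigma)."""
    i = len(Cs)
    blocks = []
    for j, Cj in enumerate(Cs, start=1):
        assert Cj.shape == (sigma, j * m)
        O_D = np.zeros((sigma, (i - j) * m), dtype=np.int64)       # dummy: aligns C_j
        O_j = np.zeros((sigma, (j - 1) * sigma), dtype=np.int64)   # aligns I_sigma
        pad = np.zeros((sigma, (i - j) * sigma), dtype=np.int64)   # assumed right padding
        blocks.append(np.hstack([Cj, O_D, O_j, np.eye(sigma, dtype=np.int64), pad]))
    return np.vstack(blocks)


P, sigma, m, i = 101, 1, 2, 3                   # toy parameters (assumed)
rng = np.random.default_rng(2)
# C_j holds l_j (length alpha_j = j*sigma*m) arranged column-wise into a sigma x jm matrix
Cs = [rng.integers(0, P, size=j * sigma * m).reshape((sigma, j * m), order="F")
      for j in range(1, i + 1)]
L = build_L(Cs, sigma, m)
G = rng.integers(0, P, size=(4, i * sigma))     # G_i for 4 assumed short-packet opportunities
A = G @ L % P                                   # short packets A_i = G_i L^(i)
```

The embedded identity plays the same role as $I_b$ in the long packets: it lets Bob read off the coding coefficients of each received short packet.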



In order to eliminate the "fake" information injected by the adversaries, Alice should introduce an adequate amount of linearly independent redundancy, i.e., choose $\sigma$ and $m$ appropriately. Alice may choose any $\sigma$ such that $\sigma\le\tilde M_i - \tilde z_i$, $\forall i$ (e.g., $\sigma = 1$ is a safe choice). She then chooses $m$ such that $\sigma m\ge 2bc + 2\sigma\tilde c + 1$, where $\tilde c$ bounds $\tilde C_i$ in the same way that $c$ bounds $C_i$. Note that the size of the secret, $i(i+1)\sigma m/2$, is again negligible in $n$.

Finally, at the $i$-th stage Alice sends $X_i$ as long packets with packet length $n+b$, and $A_i$ as short packets with packet length $i(m+\sigma)$. Alice repeats this procedure until Bob decodes successfully.

B. Decoder

At the $i$-th stage, Bob receives long packets $Y_i$ and short packets $J_i$:
$$Y_i = T_iX_i + Q_iZ_i, \quad (8)$$
$$J_i = \tilde T_iA_i + \tilde Q_iE_i, \quad (9)$$
where $T_i\in\mathbb{F}_q^{M_i\times C_i}$ and $\tilde T_i\in\mathbb{F}_q^{\tilde M_i\times\tilde C_i}$ are the transfer matrices between Alice and Bob, $Q_i\in\mathbb{F}_q^{M_i\times z_i}$ and $\tilde Q_i\in\mathbb{F}_q^{\tilde M_i\times\tilde z_i}$ are the transfer matrices between Calvin and Bob, and $Z_i$ and $E_i$ are the errors injected into the long packets and short packets, respectively.

Bob then stacks the long and short packets that he has received so far to get $Y^{(i)}$, the vertical stack of $Y_1,\ldots,Y_i$, and $J^{(i)}$, the vertical stack of $J_1,\ldots,J_i$, where the dummy matrix $O_D$ is padded to $\{J_k\}$ in the same way as it is padded to $\{L_k\}$.

Bob evaluates the rank of $Y^{(i)}$ and waits to receive more packets if $r_i = \mathrm{Rank}(Y^{(i)}) < b$. When $r_i\ge b$, Bob tries to decode. Without loss of generality we assume the rows of $Y^{(i)}$ are linearly independent; otherwise, Bob selects $r_i$ linearly independent rows from $Y^{(i)}$ and proceeds similarly. He then picks a basis for the column space of $Y^{(i)}$. As will be shown later, the last $b$ columns of $Y^{(i)}$ (corresponding to the identity matrix in $X_0$) are linearly independent w.h.p., so they are chosen and denoted by an $r_i\times b$ matrix $T_c^{(i)}$. Without loss of generality (by permuting the columns if necessary) we assume that the remaining $r_i - b$ linearly independent columns correspond to the first $r_i - b$ columns of $Y^{(i)}$, denoted by an $r_i\times(r_i - b)$ matrix $T_a^{(i)}$. So we can expand $Y^{(i)}$ with respect to this basis as
$$Y^{(i)} = \big[T_a^{(i)}\ \ T_c^{(i)}\big]\begin{bmatrix} I_{r_i-b} & F_Z & 0\\ 0 & F_X & I_b\end{bmatrix}, \quad (10)$$
where $F_Z$ and $F_X$ are matrices of coefficients.

Bob deals with $J^{(i)}$ in a similar way. Let $\tilde r_i$ be the rank of $J^{(i)}$, let $\tilde T_c^{(i)}\in\mathbb{F}_q^{\tilde r_i\times i\sigma}$ be the last $i\sigma$ columns of $J^{(i)}$, and let $\tilde T_a^{(i)}\in\mathbb{F}_q^{\tilde r_i\times(\tilde r_i - i\sigma)}$ be the first $\tilde r_i - i\sigma$ columns of $J^{(i)}$. Then w.h.p. $[\tilde T_a^{(i)}\ \ \tilde T_c^{(i)}]$ forms a basis of the column space of $J^{(i)}$, and we can write
$$J^{(i)} = \big[\tilde T_a^{(i)}\ \ \tilde T_c^{(i)}\big]\begin{bmatrix} I_{\tilde r_i - i\sigma} & F_E & 0\\ 0 & F_A & I_{i\sigma}\end{bmatrix}, \quad (11)$$
where $F_E$ and $F_A$ are matrices of coefficients.

Equations (10) and (11) characterize the relationship between the received observations and the input messages due to the effect of the network transform. In order to decode successfully, Bob needs to take into account the built-in redundancy of the message, i.e., the relation between $w$ and $\{l_k\}$, as follows. For all $i$, split $X_0$ and $L^{(i)}$ as
$$X_0 = \big[X_a^{(i)}\ \ X_b^{(i)}\ \ X_c^{(i)}\big], \quad (12)$$
$$L^{(i)} = \big[L_a^{(i)}\ \ L_b^{(i)}\ \ L_c^{(i)}\big], \quad (13)$$
where $X_a^{(i)}$ are the first $r_i - b$ columns of $X_0$, $X_c^{(i)}$ are the last $b$ columns of $X_0$, and $X_b^{(i)}$ are the remaining columns in the middle; $L_a^{(i)}$ are the first $\tilde r_i - i\sigma$ columns of $L^{(i)}$, $L_c^{(i)}$ are the last $i\sigma$ columns of $L^{(i)}$, and $L_b^{(i)}$ are the remaining columns in the middle. Let $x_a^{(i)}$, $x_b^{(i)}$ and $x_c^{(i)}$ be the vectorized versions of $X_a^{(i)}$, $X_b^{(i)}$ and $X_c^{(i)}$, and let $l_a^{(i)}$, $l_b^{(i)}$ and $l_c^{(i)}$ be the vectorized versions of $L_a^{(i)}$, $L_b^{(i)}$ and $L_c^{(i)}$, omitting the dummy $O_D$. By construction it follows that
$$w = \begin{bmatrix} x_a^{(i)}\\ x_b^{(i)}\end{bmatrix}, \quad (14)$$
and the entries of $l_a^{(i)}$ and $l_b^{(i)}$ are exactly the entries of $l_1,\ldots,l_i$ (up to a fixed reordering determined by the construction). Note that $x_c^{(i)}$ and $l_c^{(i)}$ are left out because they correspond to the redundant identity matrices.

Now Bob constructs two matrices $B_{top}$ and $B_{mid}$ as
$$B_{top} = \big[-F_Z^T\otimes I_b\ \ \ I_{b\beta}\big], \quad (15)$$
$$B_{mid} = \big[-F_E^T\otimes I_{i\sigma}\ \ \ I_{i\sigma\gamma}\big], \quad (16)$$
i.e., the $(j,k)$-th blocks multiplying $x_a^{(i)}$ and $l_a^{(i)}$ are $-f^Z_{kj}I_b$ and $-f^E_{kj}I_{i\sigma}$, where $f^Z_{kj}$ and $f^E_{kj}$ are the $(k,j)$-th entries of $F_Z$ and $F_E$, respectively, $\beta = n + b - r_i$ and $\gamma = i(m+\sigma) - \tilde r_i$. He then deletes all columns in $B_{mid}$ corresponding to the positions of the dummy zero padding when vectorizing $L^{(i)}$, and obtains a submatrix $B'_{mid}$. Finally, Bob lets
$$B_{bot} = \begin{bmatrix} D_1 & I_{\alpha_1} & & & \\ D_2 & & I_{\alpha_2} & & \\ \vdots & & & \ddots & \\ D_i & & & & I_{\alpha_i}\end{bmatrix}.$$
Notice that if Bob permutes the columns of $Y^{(i)}$ and $J^{(i)}$ when constructing $T_a^{(i)}$ and $\tilde T_a^{(i)}$, then he needs to permute the columns of $B_{bot}$ accordingly. Then he tries to solve the equations
$$B\begin{bmatrix} x_a^{(i)}\\ x_b^{(i)}\\ l_a^{(i)}\\ l_b^{(i)}\end{bmatrix} = \begin{bmatrix} f_X\\ f_A\\ h^{(i)}\end{bmatrix}, \quad (17)$$
where $f_X$ and $f_A$ are the vectorized versions of $F_X$ and $F_A$, respectively, $h^{(i)} = (h_1^T,\ldots,h_i^T)^T$, and the matrix $B$ is defined as
$$B = \begin{bmatrix} \begin{matrix} B_{top} & 0\\ 0 & B'_{mid}\end{matrix}\\ B_{bot}\end{bmatrix}.$$

Bob tries to solve (17) and, if there exists no solution, he waits for more redundancy from Alice and tries to solve it again at the next stage. If there is a unique solution to (17), then Bob has decoded successfully with high probability. Otherwise, if there are multiple solutions, Bob declares a decoding failure.
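The block structure of $B_{top}$ rests on the identity $\mathrm{vec}(X_aF_Z) = (F_Z^T\otimes I_b)\,\mathrm{vec}(X_a)$ for column-stacking vectorization. Assuming the middle columns of $X_0$ satisfy $X_b = X_aF_Z + F_X$ (our reading of how the expansion of $Y^{(i)}$ constrains the message once the last $b$ columns of $X_0$ are the identity), the NumPy sketch below checks numerically that the block row $[-(F_Z^T\otimes I_b)\ \ I_{b\beta}]$ maps $[x_a;\,x_b]$ to $\mathrm{vec}(F_X)$. Integers stand in for field elements, and the sizes and names are illustrative.

```python
import numpy as np


def vec(M):
    """Column-stacking vectorization, as defined in Section IV."""
    return M.reshape(-1, order="F")


rng = np.random.default_rng(0)
b, ra, beta = 2, 3, 4                           # toy sizes for b, r_i - b, beta (assumed)
Xa = rng.integers(0, 10, size=(b, ra))
FZ = rng.integers(0, 10, size=(ra, beta))
FX = rng.integers(0, 10, size=(b, beta))
Xb = Xa @ FZ + FX                               # assumed relation X_b = X_a F_Z + F_X

# Block row [-(F_Z^T kron I_b)   I_{b*beta}] acting on the stacked vector [x_a; x_b]
B_top = np.hstack([-np.kron(FZ.T, np.eye(b, dtype=np.int64)),
                   np.eye(b * beta, dtype=np.int64)])
lhs = B_top @ np.concatenate([vec(Xa), vec(Xb)])
```

The same pattern with $F_E$ and $I_{i\sigma}$ gives the middle block for the short packets, before the columns belonging to the dummy padding are deleted.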

C. Performance 

In this section we show that the proposed scheme succeeds with high probability and achieves the optimal rate. Our first step is to establish (10) and (11). We first consider the short packets. Note that (11) follows from Lemma 5 below.

Lemma 5: $\hat{T}^{(i)}$ has full column rank with high probability.

Proof: For notational convenience, we define the block matrices $T^{(i)}$, $A^{(i)}$, $Q^{(i)}$ and $E^{(i)}$ from the per-stage blocks $T_j$, $A_j$, $Q_j$ and $E_j$, $j = 1, \ldots, i$ [the explicit block layouts are not recoverable from the extracted text].



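Then, from (?) and (9), we have the concise relationship

$$Y^{(i)} = T^{(i)} G^{(i)} L^{(i)} + Q^{(i)} E^{(i)}, \qquad (18)$$

where $G^{(i)}$ is a block matrix assembled from $G_1, G_2, \ldots, G_i$ [its exact layout is not recoverable from the extracted text].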



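By construction, [inequality not recoverable from the extracted text]. By an argument identical to that of Lemma 2, this implies that with high probability $T^{(i)}$ and $Q^{(i)}$ both have full column rank and span disjoint column spaces (except for the zero vector). The random matrix $G^{(i)}$ also has full rank w.h.p., and therefore $T^{(i)}G^{(i)}$ and $Q^{(i)}$ also both have full column rank and span disjoint column spaces. We can write the last $ia$ columns in (18), which correspond to the redundant identity matrix in $L^{(i)}$, as

$$\hat{T}^{(i)} = T^{(i)} G^{(i)} + Q^{(i)} \hat{E}^{(i)}, \qquad (19)$$

where $\hat{E}^{(i)}$ denotes the last $ia$ columns of $E^{(i)}$. Since $\hat{T}^{(i)} = [T^{(i)}G^{(i)} \;\; Q^{(i)}] \begin{bmatrix} I \\ \hat{E}^{(i)} \end{bmatrix}$, and both factors have full column rank, the columns of $\hat{T}^{(i)}$ are linearly independent w.h.p. ■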
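The last step of the proof of Lemma 5 rests on a factorization. A matrix of the form $T^{(i)}G^{(i)} + Q^{(i)}E'$ equals $[T^{(i)}G^{(i)}\;\; Q^{(i)}]$ times $\begin{bmatrix} I \\ E' \end{bmatrix}$, where $E'$ stands for the last $ia$ columns of $E^{(i)}$. A product of two full-column-rank matrices has full column rank. The sketch below checks this numerically over a hypothetical prime field GF(257), using random stand-ins for the matrices. The dimensions are arbitrary choices, not values from the paper.

```python
import random

P = 257  # hypothetical prime field size


def rank_mod_p(A):
    """Rank of a matrix (list of rows) over GF(P) via Gaussian elimination."""
    M = [[v % P for v in row] for row in A]
    rank = 0
    for c in range(len(M[0])):
        piv = next((r for r in range(rank, len(M)) if M[r][c]), None)
        if piv is None:
            continue
        M[rank], M[piv] = M[piv], M[rank]
        inv = pow(M[rank][c], P - 2, P)
        M[rank] = [v * inv % P for v in M[rank]]
        for r in range(len(M)):
            if r != rank and M[r][c]:
                f = M[r][c]
                M[r] = [(a - f * b) % P for a, b in zip(M[r], M[rank])]
        rank += 1
    return rank


def matmul(A, B):
    """Matrix product over GF(P)."""
    return [[sum(A[r][j] * B[j][c] for j in range(len(B))) % P
             for c in range(len(B[0]))] for r in range(len(A))]


def rand_matrix(rows, cols):
    return [[random.randrange(P) for _ in range(cols)] for _ in range(rows)]


random.seed(1)
n_rows, k, z, cols = 10, 4, 3, 4   # hypothetical dimensions
T = rand_matrix(n_rows, k)         # stand-in for T^(i)
G = rand_matrix(k, cols)           # stand-in for G^(i)
Q = rand_matrix(n_rows, z)         # stand-in for Q^(i)
E_last = rand_matrix(z, cols)      # stand-in for the last ia columns of E^(i)

TG = matmul(T, G)
QE = matmul(Q, E_last)
T_hat = [[(a + b) % P for a, b in zip(r1, r2)] for r1, r2 in zip(TG, QE)]
TG_Q = [r1 + r2 for r1, r2 in zip(TG, Q)]   # the block matrix [TG  Q]

print("rank [TG Q] =", rank_mod_p(TG_Q), "of", cols + z)
print("rank (TG + Q E_last) =", rank_mod_p(T_hat), "of", cols)
```

Whenever the first printed rank is full, the second one must be full as well, whatever the random draw.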
A similar argument also holds for the long packets: 

Lemma 6: If $\sum_{j=1}^{i} M_j - \sum_{j=1}^{i} z_j \ge b$, then $\hat{T}^{(i)}$ has full column rank with high probability.

Proof: Consider the last $b$ columns in (10), which correspond to the redundant identity matrix in $X$:

[The corresponding equation is not recoverable from the extracted text.]

By Lemma [?], $T^{(i)}$ and $Q^{(i)}$ both have full column rank and span disjoint column spaces (except for the zero vector). Hence the columns of $\hat{T}^{(i)}$ are linearly independent. ■

Now we are ready to analyze the key equation (17). We first
prove a related lemma. 

Lemma 7: With high probability, (9) and (11) are equivalent to the following equation:

[Equation (20) is not recoverable from the extracted text.]

Proof: We use a technique similar to [?]. Substituting (19) into (18), it follows that

$$Y^{(i)} = \hat{T}^{(i)} L^{(i)} + Q^{(i)}\left(E^{(i)} - \hat{E}^{(i)} L^{(i)}\right).$$

Then, by (11), we have

[Equation (21) is not recoverable from the extracted text.]



From (19), the columns of $\hat{T}^{(i)}$ are spanned by the columns of $[T^{(i)} \;\; Q^{(i)}]$, so there exist matrices $V_1$ and $V_2$ such that

$$\hat{T}^{(i)} = T^{(i)} V_1 + Q^{(i)} V_2.$$

We can then rewrite (21) as

[Equation (22) is not recoverable from the extracted text.]

However, notice from (19) that the columns of $T^{(i)}$ and the columns of $Q^{(i)}$ are w.h.p. linearly independent, i.e., the column spaces of $T^{(i)}$ and $Q^{(i)}$ are disjoint except for the zero vector. Hence (22) implies the following set of equations:

[Equations (23) and (24) are not recoverable from the extracted text.]



Here (23) suffices for the purpose of decoding $X_0$. We split (23) into three parts as in (11), and obtain

[Equations (25), (26) and (27) are not recoverable from the extracted text.]



By Lemma 5, $\hat{T}^{(i)}$ has full column rank and is therefore left-invertible, so it follows from (25) that [expression lost in extraction]. Substituting this into (26) gives

[The resulting equation is not recoverable from the extracted text.]

Finally, notice that by construction [matrix lost in extraction] is the identity matrix, and therefore (27) is redundant. Hence we can conclude that (9) and (11) are equivalent to (20). ■
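The proof uses a standard fact: a matrix with full column rank, such as the one in Lemma 5, is left-invertible, so it can be cancelled from the left of an equation. The minimal sketch below works over a hypothetical prime field GF(257). It row-reduces $[A \mid I]$, and the top rows of the right half then form a left inverse. The helper names and example matrices are illustrative only, not taken from the paper.

```python
P = 257  # hypothetical prime field size


def left_inverse_mod_p(A):
    """Return X with X A = I over GF(P), assuming A has full column rank.

    Row-reduces [A | I_m]; the top k rows of the right half form X.
    """
    m, k = len(A), len(A[0])
    M = [[v % P for v in A[r]] + [int(r == c) for c in range(m)]
         for r in range(m)]
    for c in range(k):
        piv = next((r for r in range(c, m) if M[r][c]), None)
        if piv is None:
            raise ValueError("A does not have full column rank")
        M[c], M[piv] = M[piv], M[c]
        inv = pow(M[c][c], P - 2, P)
        M[c] = [v * inv % P for v in M[c]]
        for r in range(m):
            if r != c and M[r][c]:
                f = M[r][c]
                M[r] = [(a - f * b) % P for a, b in zip(M[r], M[c])]
    return [row[k:] for row in M[:k]]


def matmul(A, B):
    """Matrix product over GF(P)."""
    return [[sum(A[r][j] * B[j][c] for j in range(len(B))) % P
             for c in range(len(B[0]))] for r in range(len(A))]


# Illustrative 6x3 matrix of full column rank (its top 3x3 block has det 22).
T_hat = [[0, 1, 2], [1, 0, 3], [4, 5, 0], [1, 1, 1], [2, 3, 5], [7, 11, 13]]
L = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]

X = left_inverse_mod_p(T_hat)
print(matmul(X, matmul(T_hat, L)) == L)  # True: T_hat cancels from the left
```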



By a similar argument, for long packets we have the 
following result: 

Lemma 8: If $\sum_{j=1}^{i} M_j - \sum_{j=1}^{i} z_j \ge b$, then with high probability (8) and (10) are equivalent to

[Equation (28) is not recoverable from the extracted text.]



Corollary 2: If $\sum_{j=1}^{i} M_j - \sum_{j=1}^{i} z_j \ge b$, then the matrix equation (17) holds with high probability.

Proof: Notice that (17) is equivalent to the following set of three matrix equations:



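[Equations (29), (30) and (31), the block rows of (17) associated with $B_{\mathrm{top}}$, $B'_{\mathrm{mid}}$ and $B_{\mathrm{bot}}$, respectively, are not recoverable from the extracted text.]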



But now note that (29) is equivalent to (28); (31) is equivalent to (?); and (30) is equivalent to (20), because all the deleted columns correspond to the zero padding in $L^{(i)}$. ■
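Deleting the zero-padding columns to form $B'_{\mathrm{mid}}$ loses nothing. The deleted columns multiply unknowns that are known to be zero, so they contribute nothing to any row. The small sketch below illustrates this with made-up numbers over a hypothetical prime field. The helper name `delete_columns` is illustrative.

```python
P = 257  # hypothetical prime field size


def delete_columns(B, drop):
    """Return B with the columns whose indices are in `drop` removed."""
    return [[v for c, v in enumerate(row) if c not in drop] for row in B]


def matvec(A, x):
    """Matrix-vector product over GF(P)."""
    return [sum(a * v for a, v in zip(row, x)) % P for row in A]


# Hypothetical example: positions 1 and 4 of x are dummy zero padding.
B = [[3, 1, 4, 1, 5],
     [9, 2, 6, 5, 3],
     [5, 8, 9, 7, 9]]
x = [7, 0, 2, 6, 0]
padding = {1, 4}

B_prime = delete_columns(B, padding)
x_kept = [v for c, v in enumerate(x) if c not in padding]
print(matvec(B, x) == matvec(B_prime, x_kept))  # True
```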

Finally, we need to prove that (17) has a unique solution, i.e., that the probability of decoding an erroneous packet vanishes:

Lemma 9: If $am > 2bc + 2ac + 1$, then with high probability there does not exist $X' \neq X_0$ such that $X'$ satisfies (17).

Proof: Suppose $X' \neq X_0$, and let $x'_a$, $x'_b$ be its vectorized versions as described in (12). We consider the probability that there exist $x'_a$, $x'_b$, $l'_a$ and $l'_b$ that satisfy (17). Let us first consider the top rows of $B$, which correspond to the blocks $B_{\mathrm{top}}$ and $B'_{\mathrm{mid}}$:



[Equation (32), the restriction of (17) to these rows, is not recoverable from the extracted text.]

They are equivalent to

[Equations (33) and (34) are not recoverable from the extracted text.]



Therefore, given arbitrary values of $x'_a$ and $l'_a$, there are unique corresponding values of $x'_b$ and $l'_b$ that satisfy (32).

Now, given any $x'_a$ and $l'_a$ (and the corresponding $x'_b$ and $l'_b$) such that (32) holds, we consider the probability that the bottom $\sum_{k=1}^{i} a_k = (i^2+i)am/2$ rows of (17) also hold:



[Equation (35), the restriction of (17) to the rows of $B_{\mathrm{bot}}$, is not recoverable from the extracted text.]



This is equivalent to

$$B_{\mathrm{bot}} \begin{bmatrix} \tilde{x}_a \\ \tilde{x}_b \\ \tilde{l} \end{bmatrix} = 0, \qquad (36)$$

where $\tilde{x}_a = x'_a - x_a$ and $\tilde{x}_b = x'_b - x_b$ cannot both be the zero vector because $X' \neq X_0$, and $\tilde{l}$ is the corresponding difference of the $l$-variables. Here $\tilde{x}_a \in \mathbb{F}_q^{\theta_a}$, $\tilde{x}_b \in \mathbb{F}_q^{\theta_b}$ and $\tilde{l} \in \mathbb{F}_q^{\theta_l}$, with $\theta_a = b(r_i - b)$, $\theta_b = \beta b$ and $\theta_l = (i^2+i)am/2$. Denoting the $(u,v)$ entry of $B_{\mathrm{bot}}$ by $s_{u,v}$, the $j$-th row of (36) is

$$\sum_{k=1}^{b(r_i-b)} \tilde{x}_{a,k}\, s_{j,k} + \sum_{k=1}^{\beta b} \tilde{x}_{b,k}\, s_{j,k+b(r_i-b)} + \sum_{k=1}^{(i^2+i)am/2} \tilde{l}_{k}\, s_{j,k+nb} = 0. \qquad (37)$$

Let $S_j$ be the $(j,1)$ entry of $B_{\mathrm{bot}}$ before column permutation. Then $s_{j,k} = S_j^{\pi(k)}$ for $1 \le k \le nb$, where $\pi$ is a permutation of $\{1, \ldots, nb\}$. Hence the left-hand side of (37) is a non-zero polynomial of degree at most $b(r_i - b) + \beta b = nb$ in the variable $S_j$. (The $\{s_{j,k+nb}\}$ are the constants 0 or 1 by construction and do not depend on $S_j$.) A non-zero polynomial of degree at most $nb$ over $\mathbb{F}_q$ has at most $nb$ roots, so the probability that $S_j$ is chosen as one of its roots is at most $nb/q$. This upper-bounds the probability that row $j$ of (36) holds. Because the $\{S_j\}$ are chosen independently, (36) holds with probability no larger than $(nb/q)^{(i^2+i)am/2}$.
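The root-counting step can be checked on a toy instance. The sketch below builds one row of the form (37), using $s_{j,k} = S^{\pi(k)}$ for a random permutation $\pi$ and constant 0/1 coefficients on the $l$-part. It fixes a non-zero difference vector and counts how many values of $S$ in a prime field make the row vanish; the count can never exceed the degree bound $nb$. The field size, $nb$, and the length of the $l$-part are hypothetical small values, chosen so that the exhaustive count is fast.

```python
import random

P = 101     # hypothetical prime field size q
NB = 6      # hypothetical value of n*b, the degree bound
L_LEN = 3   # hypothetical number of l-variables in the row

random.seed(3)
perm = random.sample(range(1, NB + 1), NB)   # pi: a permutation of {1, ..., nb}
l_coeffs = [random.choice([0, 1]) for _ in range(L_LEN)]  # constants 0 or 1


def row_value(S, x_diff, l_diff):
    """One row of the form (37): sum_k x_k * S**pi(k) + sum_k c_k * l_k (mod P)."""
    poly = sum(xk * pow(S, perm[k], P) for k, xk in enumerate(x_diff))
    const = sum(ck * lk for ck, lk in zip(l_coeffs, l_diff))
    return (poly + const) % P


# Difference vector for the x-part: random, but forced to be non-zero.
x_diff = [random.randrange(P) for _ in range(NB)]
x_diff[random.randrange(NB)] = random.randrange(1, P)
l_diff = [random.randrange(P) for _ in range(L_LEN)]

roots = [S for S in range(P) if row_value(S, x_diff, l_diff) == 0]
print(len(roots), "roots out of", P, "field elements; bound nb =", NB)
```

Because the exponents $\pi(k)$ are distinct and at least 1, a non-zero $x$-part always gives a non-zero polynomial. That is why the bound holds for every draw, not just on average.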

Finally, there are at most $q^{b(r_i-b)}$ different $x'_a$ and at most $q^{ia(r_i-ia)}$ different $l'_a$, where $r_i - b \le ic$ and, by (9), $r_i - ia \le ic$. Therefore, by the union bound, the probability that there exists $X' \neq X_0$ such that $x'_a$, $x'_b$, $l'_a$ and $l'_b$ satisfy (17) is at most

$$q^{b(r_i-b)}\, q^{ia(r_i-ia)} \left(\frac{nb}{q}\right)^{(i^2+i)am/2}.$$

■

We are ready to present the final conclusion. 

Theorem 3: For all $i$ such that $b + \sum_{j=1}^{i} z_j \le \sum_{j=1}^{i} M_j$, with the proposed coding scheme Bob is able to decode $X_0$ correctly with high probability at the $i$-th stage. Otherwise, Bob waits for more redundancy instead of decoding erroneous packets.

Proof: By Corollary 2, $X_0$ can be solved from (17) if $b + \sum_{j=1}^{i} z_j \le \sum_{j=1}^{i} M_j$. By Lemma 9, if a solution exists, it is correct and unique. Otherwise, there is no solution to (17), and by the algorithm Bob waits for more redundancy. ■
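Theorem 3 can be read as a stopping rule. Decoding succeeds at the first stage $i$ at which the accumulated quantities satisfy $b + \sum_{j \le i} z_j \le \sum_{j \le i} M_j$, and Bob keeps waiting before that stage. The helper below (a hypothetical name) finds that stage for given per-stage sequences $M_j$ and $z_j$. The example numbers are invented for illustration.

```python
from itertools import accumulate


def first_decodable_stage(b, M, z):
    """Return the first stage i (1-based) with b + sum(z[:i]) <= sum(M[:i]),
    i.e. the condition of Theorem 3, or None if no prefix satisfies it."""
    for i, (m_sum, z_sum) in enumerate(zip(accumulate(M), accumulate(z)), start=1):
        if b + z_sum <= m_sum:
            return i
    return None


# Hypothetical per-stage values of M_j and z_j.
print(first_decodable_stage(b=5, M=[3, 3, 3, 3], z=[1, 2, 0, 1]))  # 3
```

At stages 1 and 2 the condition fails ($6 > 3$ and $8 > 6$), and at stage 3 it holds ($8 \le 9$), so the example prints 3.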

Similar to the case of the secret channel model, Theorem 3 shows that our code is optimal in the sense that decoding succeeds with high probability whenever the total amount of information received by the sink satisfies the cut-set bound with respect to the amount of message and error information. We can also show rate optimality in the i.i.d. case.

Theorem 4: Assume $M_i, z_i$, $i = 1, 2, \ldots$ are i.i.d. random variables with means $E[M]$ and $E[z]$, respectively. If there exists $\epsilon > 0$ such that $E[M] - E[z] > \epsilon$, then with the proposed coding scheme Bob is able to decode $X_0$ correctly with high probability, and on average the code achieves rate

$$r \ge \frac{b}{b + c - 1}\left(E[M] - E[z]\right).$$

Proof: Note that both long packets and short packets are sent over the network. We consider the short packets to be overhead. At the $i$-th stage, the length of a short packet is $i(m + a)$, which is negligible when $n$ is chosen large enough. The rest of the proof is identical to the proof of Theorem 2. ■
Again, the proposed scheme is asymptotically rate-optimal as $b$ is chosen large enough. The computational cost of design, encoding, and decoding is dominated by the cost of solving (17), which is $O((nic)^3)$.
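The factor $b/(b+c-1)$ in Theorem 4 is why the scheme is only asymptotically rate-optimal. It tends to 1 as $b$ grows, so the bound approaches $E[M] - E[z]$. The sketch evaluates the bound exactly with rational arithmetic for a few hypothetical parameter values; `rate_lower_bound` is an illustrative name.

```python
from fractions import Fraction


def rate_lower_bound(b, c, mean_M, mean_z):
    """Lower bound r >= b (E[M] - E[z]) / (b + c - 1) from Theorem 4."""
    return Fraction(b, b + c - 1) * (Fraction(mean_M) - Fraction(mean_z))


# Hypothetical parameters: c = 10, E[M] = 8, E[z] = 2, so E[M] - E[z] = 6.
for b in (10, 100, 1000):
    print(b, float(rate_lower_bound(b, 10, 8, 2)))
```

With these values, the bound rises toward 6 as $b$ increases and never reaches it for any finite $b$.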

V. Conclusion 

This paper introduces information-theoretic rateless resilient network codes against Byzantine adversaries. Unlike previous works, knowledge about the network and the adversaries is not required, and the codes adapt to their parameters by sending more redundancy over time if necessary. We present two algorithms targeting two network models. The first model assumes there is a low-rate secret channel between the source and the destination. The second model assumes the source and destination share some "small" random secrets that are independent of the input information. For both models our codes are rate-optimal, distributed, and polynomial-time, work on general topologies, and only require the source and destination nodes to be modified.